THE MINIMUM BISECTION IN THE PLANTED BISECTION MODEL


AMIN COJA-OGHLAN* *, OLIVER COOLEY**, MIHYUN KANG** AND KATHRIN SKUBCH 


ABSTRACT. In the planted bisection model a random graph $G(n,p_+,p_-)$ with $n$ vertices is created by partitioning the
vertices randomly into two classes of equal size (up to $\pm 1$). Any two vertices that belong to the same class are linked
by an edge with probability $p_+$ and any two that belong to different classes with probability $p_- < p_+$, independently. The
planted bisection model has been used extensively to benchmark graph partitioning algorithms. If $p_\pm = 2d_\pm/n$ for numbers
$0 < d_- < d_+$ that remain fixed as $n \to \infty$, then w.h.p. the "planted" bisection (the one used to construct the graph) will
not be a minimum bisection. In this paper we derive an asymptotic formula for the minimum bisection width under the
assumption that $d_+ - d_- > c\sqrt{d_+ \ln d_+}$ for a certain constant $c > 0$.

Mathematics Subject Classification: 05C80 (primary), 05C15 (secondary) 


1. Introduction 

1.1. Background and motivation. Since the early days of computational complexity, graph partitioning problems
have played a central role in computer science. Over the years they have inspired some of the most important
algorithmic techniques that we have at our disposal today, such as network flows or semidefinite programming [26, 39].

In the context of the probabilistic analysis of algorithms, it is hard to think of a more intensely studied problem than
the planted bisection model. In this model a random graph $G = G(n, p_{+1}, p_{-1})$ on $[n] = \{1, \ldots, n\}$ is created by
choosing a map $\sigma : V \to \{-1, 1\}$ uniformly at random subject to $\big| |\sigma^{-1}(1)| - |\sigma^{-1}(-1)| \big| \le 1$ and connecting any
two vertices $v \ne w$ with probability $p_{\sigma(v)\sigma(w)}$ independently, where $0 < p_{-1} < p_{+1} < 1$. To ease notation, we often
write $p_+$ for $p_{+1}$ and $p_-$ for $p_{-1}$, and handle subscripts similarly for other parameters.
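The construction just described can be sketched in a few lines of Python. This is only an illustration under our own conventions: the function name `sample_planted_bisection`, the use of vertices `0, ..., n-1` instead of `[n]`, and the edge-set representation are assumptions, not part of the paper.

```python
import random

def sample_planted_bisection(n, p_plus, p_minus, rng=None):
    """Return (sigma, edges) for one draw of G(n, p_plus, p_minus)."""
    rng = rng or random.Random()
    # Balanced planted assignment: class sizes differ by at most one.
    # For odd n a fair coin decides which class gets the extra vertex.
    big = rng.choice([1, -1])
    labels = [big] * ((n + 1) // 2) + [-big] * (n // 2)
    rng.shuffle(labels)
    sigma = dict(enumerate(labels))
    edges = set()
    for v in range(n):
        for w in range(v + 1, n):
            # Same class: probability p_plus; different classes: p_minus.
            p = p_plus if sigma[v] == sigma[w] else p_minus
            if rng.random() < p:
                edges.add((v, w))
    return sigma, edges
```

In the sparse regime studied here one would call it with `p_plus = 2 * d_plus / n` and `p_minus = 2 * d_minus / n`.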

Given the random graph $G$ (but not the planted bisection $\sigma$), the task is to find a minimum bisection of $G$, i.e., to
partition the vertices into two disjoint sets $S$, $\bar S = [n] \setminus S$ whose sizes satisfy $\big| |S| - |\bar S| \big| \le 1$ such that the number
of $S$-$\bar S$-edges is minimum. The planted bisection model has been employed to gauge algorithms based on spectral,
semidefinite programming, flow and local search techniques, to name but a few.
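To make the objective concrete, the following brute-force sketch computes the minimum bisection width of a tiny graph by enumerating all balanced partitions. It is exponential in $n$ and only meant as a reference definition; the name `bisection_width` and the edge-list input format are our own assumptions.

```python
from itertools import combinations

def bisection_width(n, edges):
    """Minimum number of crossing edges over all bisections of {0, ..., n-1}."""
    best = None
    # Choosing one side of size floor(n/2) covers every bisection,
    # because the other side then has size ceil(n/2).
    for side in combinations(range(n), n // 2):
        s = set(side)
        width = sum(1 for v, w in edges if (v in s) != (w in s))
        if best is None or width < best:
            best = width
    return best
```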

Remarkably, for a long time the algorithm with the widest range of $n, p_\pm$ for which a minimum bisection can be
found efficiently was one of the earliest ones, namely Boppana's spectral algorithm [6]. It succeeds if

$$n(p_+ - p_-) > c\sqrt{n p_+ \ln n}$$

for a certain constant $c > 0$. Under this assumption the planted bisection is minimum w.h.p. In fact, recently the
critical value $c^* > 0$ for which this statement is true was identified explicitly [37]. In particular, for $n(p_+ - p_-) > c^*\sqrt{n p_+ \ln n}$
the minimum bisection width simply equals $(\frac14 + o(1)) n^2 p_-$ w.h.p.

But if $n(p_+ - p_-) < c^*\sqrt{n p_+ \ln n}$, then the minimum bisection width will be strictly smaller than the width of
the planted bisection w.h.p. Yet there is another spectral algorithm [9] that finds a minimum bisection w.h.p. under the
weaker assumption that

$$n(p_+ - p_-) > c\sqrt{n p_+ \ln(n p_+)} \tag{1.1}$$

for a certain constant $c > 0$, and even certifies the optimality of its solution. However, [9] does not answer what is
arguably the most immediate question: what is the typical value of the minimum bisection width?

In this paper we derive the value to which the (suitably scaled) minimum bisection width converges in probability.
We confine ourselves to the case that $\frac n2 p_\pm = d_\pm$ remain fixed as $n \to \infty$. Hence, the random graph $G$ has bounded
average degree. This is arguably the most interesting case because the discrepancy between the planted and the
minimum bisection gets larger as the graphs get sparser. In fact, it is easy to see that in the case of fixed $\frac n2 p_\pm = d_\pm$ the
difference between the planted and the minimum bisection width is $\Theta(n)$ as the planted bisection is not even locally
optimal w.h.p.

Although we build upon some of the insights from [9], it seems difficult to prove our main result by tracing the
fairly complicated algorithm from that paper. Instead, our main tool is an elegant message passing algorithm called
Warning Propagation that plays an important role in the study of random constraint satisfaction problems via ideas
from statistical physics. Running Warning Propagation on $G$ naturally corresponds to a fixed point problem on
the 2-simplex, and the minimum bisection width can be cast as a function of the fixed point.


1.2. The main result. To state the fixed point problem, we consider the functions

$$\psi : \mathbb R \to \mathbb R,\quad x \mapsto \begin{cases} -1 & \text{if } x < -1,\\ x & \text{if } -1 \le x \le 1,\\ 1 & \text{if } x > 1, \end{cases} \qquad \tilde\psi : \mathbb R \to \mathbb R,\quad x \mapsto \begin{cases} -1 & \text{if } x \le -1,\\ 1 & \text{if } x > -1. \end{cases}$$

Let $\mathcal P(\{-1,0,1\})$ be the set of probability measures on $\{-1,0,1\}$. Clearly, we can identify $\mathcal P(\{-1,0,1\})$ with the
set of all maps $p : \{-1,0,1\} \to [0,1]$ such that $p(-1) + p(0) + p(1) = 1$, i.e., the 2-simplex. Further, let us define a
map

$$\mathcal T_{d_+,d_-} : \mathcal P(\{-1,0,1\}) \to \mathcal P(\{-1,0,1\}) \tag{1.2}$$

as follows. Given $p \in \mathcal P(\{-1,0,1\})$, let $(\eta_{p,i})_{i \ge 1}$ be a family of i.i.d. $\{-1,0,1\}$-valued random variables with
distribution $p$. Moreover, let $\gamma_\pm = \mathrm{Po}(d_\pm)$ be Poisson variables that are independent of each other and of the $\eta_{p,i}$. Let

$$Z_{p,d_+,d_-} := \sum_{i=1}^{\gamma_+} \eta_{p,i} - \sum_{i=\gamma_++1}^{\gamma_++\gamma_-} \eta_{p,i}. \tag{1.3}$$

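Then we let $\mathcal T_{d_+,d_-}(p) \in \mathcal P(\{-1,0,1\})$ be the distribution of $\psi(Z_{p,d_+,d_-})$. Further, with $(\eta_{p,i})_{i \ge 1}$ and $\gamma_\pm$ as before,
let

$$\varphi_{d_+,d_-} : \mathcal P(\{-1,0,1\}) \to \mathbb R,\quad p \mapsto \frac12\,\mathbb E\left[ \sum_{i=1}^{\gamma_+} \mathbf 1\{\eta_{p,i} = -\tilde\psi(Z_{p,d_+,d_-})\} + \sum_{i=\gamma_++1}^{\gamma_++\gamma_-} \mathbf 1\{\eta_{p,i} = \tilde\psi(Z_{p,d_+,d_-})\} \right].$$

Moreover, let us call $p \in \mathcal P(\{-1,0,1\})$ skewed if $p(1) \ge 1 - d_+^{-10}$. Finally, we denote the minimum bisection width
of a graph $G$ by $\mathrm{bis}(G)$.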
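The operator $\mathcal T_{d_+,d_-}$ can be evaluated numerically. The sketch below rests on an observation of ours that is not spelled out in the text: by Poisson thinning, $Z_{p,d_+,d_-}$ has the law of $A - B$ for independent $A \sim \mathrm{Po}(d_+p(1) + d_-p(-1))$ and $B \sim \mathrm{Po}(d_+p(-1) + d_-p(1))$, since the zero-valued summands do not contribute. The function names and the truncation parameter `cutoff` are hypothetical, and distributions are stored as triples `(p(-1), p(0), p(1))`.

```python
import math

def poisson_pmf(k, lam):
    """P(Po(lam) = k), computed in log-space for numerical stability."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def apply_T(p, d_plus, d_minus, cutoff=400):
    """One application of the operator T to p = (p(-1), p(0), p(1))."""
    p_neg, _p_zero, p_pos = p
    lam_up = d_plus * p_pos + d_minus * p_neg    # rate of +1 contributions to Z
    lam_down = d_plus * p_neg + d_minus * p_pos  # rate of -1 contributions to Z
    pmf_up = [poisson_pmf(k, lam_up) for k in range(cutoff)]
    pmf_down = [poisson_pmf(k, lam_down) for k in range(cutoff)]
    tie = sum(a * b for a, b in zip(pmf_up, pmf_down))  # P(Z = 0)
    pos, below = 0.0, 0.0
    for k in range(cutoff):
        pos += pmf_up[k] * below   # adds P(A = k, B < k), so pos -> P(Z >= 1)
        below += pmf_down[k]
    neg = max(0.0, 1.0 - tie - pos)  # P(Z <= -1)
    return (neg, tie, pos)

def iterate_T(d_plus, d_minus, rounds=50):
    """Iterate T starting from the point mass at +1, i.e. from (0, 0, 1)."""
    p = (0.0, 0.0, 1.0)
    for _ in range(rounds):
        p = apply_T(p, d_plus, d_minus)
    return p
```

For instance, `iterate_T(20, 5)` returns a numerically stationary triple whose last entry is close to 1. Whether these particular parameters satisfy the hypothesis of the theorem depends on the unspecified constant $c$, so the values are purely illustrative.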


Theorem 1.1. There exists a constant $c > 0$ such that for any $d_\pm > 0$ satisfying $d_+ - d_- > c\sqrt{d_+ \ln d_+}$ the map $\mathcal T_{d_+,d_-}$
has a unique skewed fixed point $p^*$ and $n^{-1}\,\mathrm{bis}(G)$ converges in probability to $\varphi_{d_+,d_-}(p^*)$.

In the following sections we will use that the assumptions of Theorem 1.1 allow us to assume that also $d_+$ is
sufficiently large.
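The limit in Theorem 1.1 can be evaluated numerically once a distribution $p$ is given. The sketch below uses a reformulation of ours, not stated in the text and dependent on the reading of $\varphi_{d_+,d_-}$ given above: with $A \sim \mathrm{Po}(d_+p(1) + d_-p(-1))$ and $B \sim \mathrm{Po}(d_+p(-1) + d_-p(1))$ independent, the two indicator sums count $B$ terms when $A \ge B$ and $A$ terms otherwise, so that $\varphi_{d_+,d_-}(p) = \frac12\,\mathbb E[\min(A,B)]$. The name `phi` and the truncation parameter `cutoff` are assumptions.

```python
import math

def poisson_pmf(k, lam):
    """P(Po(lam) = k), computed in log-space for numerical stability."""
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def phi(p, d_plus, d_minus, cutoff=400):
    """Evaluate (1/2) E[min(A, B)] for p = (p(-1), p(0), p(1))."""
    p_neg, _p_zero, p_pos = p
    lam_up = d_plus * p_pos + d_minus * p_neg
    lam_down = d_plus * p_neg + d_minus * p_pos
    # For independent non-negative integers,
    # E[min(A, B)] = sum over k >= 1 of P(A >= k) * P(B >= k).
    tail_up, tail_down = 1.0, 1.0
    expected_min = 0.0
    for k in range(cutoff):
        tail_up -= poisson_pmf(k, lam_up)      # now P(A >= k + 1)
        tail_down -= poisson_pmf(k, lam_down)  # now P(B >= k + 1)
        expected_min += max(tail_up, 0.0) * max(tail_down, 0.0)
    return 0.5 * expected_min
```

As a sanity check, the planted bisection has width about $\frac{n^2}{4} p_- = \frac{d_-}{2} n$, and `phi((0, 0, 1), d_plus, d_minus)` stays slightly below $d_-/2$.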


1.3. Further related work. Determining the minimum bisection width of a graph is NP-hard and there is evidence
that the problem does not even admit a PTAS [27]. On the positive side, it is possible to approximate the
minimum bisection width within a factor of $O(\ln n)$ for graphs on $n$ vertices in polynomial time [39].

The planted bisection model has been studied in statistics under the name "stochastic block model". However,
in the context of statistical inference the aim is to recover the planted partition $\sigma$ as well as possible given $G$ rather
than to determine the minimum bisection width. Recently there has been a lot of progress, much of it inspired by
non-rigorous work, on the statistical inference problem. The current status of the problem is that matching upper
and lower bounds are known for the values of $d_\pm$ for which it is possible to obtain a partition that is non-trivially
correlated with $\sigma$ [33, 35, 36]. Furthermore, there are algorithms that recover a best possible approximation to $\sigma$
under certain conditions on $d_\pm$. But since our objective is different, the methods employed in the present
paper are somewhat different and, indeed, rather simpler.

Finally, there has been recent progress on determining the minimum bisection width on the Erdős-Rényi random
graph. Although its precise asymptotics remain unknown in the case of bounded average degrees $d$, it was proved
in [13] that the main correction term corresponds to the "Parisi formula" in the Sherrington-Kirkpatrick model [40].
Additionally, regarding the case of very sparse random graphs, there is a sharp threshold for the minimum bisection
width to be linear in $n$ [29].

Generally speaking, the approach that we pursue is somewhat related to the notion of "local weak convergence"
of graph sequences as it has been used in earlier work. More specifically, we are going to argue that the minimum bisection width
of $G$ is governed by the "limiting local structure" of the graph, which is a two-type Galton-Watson tree. The fixed
point problem in Theorem 1.1 mirrors the execution of a message passing algorithm on the Galton-Watson tree. The
study of this fixed point problem, for which we use the contraction method [38], is the key technical ingredient of our
proof. We believe that this strategy provides an elegant framework for tackling many other problems in the theory of
random graphs as well. In fact, in a recent paper we applied Warning Propagation together with a fixed point analysis on
Galton-Watson trees to the $k$-core problem, and Warning Propagation has also been applied to the random graph coloring
problem.


2. Outline 

From here on we keep the notation and the assumptions of Theorem 1.1. In particular, we assume that $d_+ - d_- > c\sqrt{d_+ \ln d_+}$
for a large enough constant $c > 0$ and that $d_\pm$ remain fixed as $n \to \infty$. Furthermore we assume that
$d_+$ is bounded from below by a large enough constant. Throughout the paper all graphs will be locally finite and of
countable size.


Three main insights enable the proof of Theorem 1.1. The first one, which we borrow from [9], is that w.h.p. $G$ features
a fairly large set $\mathcal C$ of vertices such that for any two optimal bisections $\tau_1, \tau_2$ of $G$ (i.e. maps $\tau_1, \tau_2 : V(G) \to \{\pm 1\}$),
we either have $\tau_1(v) = \tau_2(v)$ for all $v \in \mathcal C$ or $\tau_1(v) = -\tau_2(v)$ for all $v \in \mathcal C$. In the language of random constraint
satisfaction problems, the vertices in $\mathcal C$ are "frozen". While there remain $\Omega(n)$ unfrozen vertices, the subgraph that
they induce is subcritical, i.e., all components are of size $O(\ln n)$ and indeed most are of bounded size.

The second main ingredient is an efficient message passing algorithm called Warning Propagation (cf. EH Chapter 19]).
We will show that a bounded number of Warning Propagation iterations suffices to arrange almost all of the
unfrozen vertices optimally and thus to obtain a very good approximation to the minimum bisection w.h.p. (Proposition 2.2).
This insight reduces our task to tracing Warning Propagation for a bounded number of rounds.

This last problem can be solved by studying Warning Propagation on a suitable Galton-Watson tree, because $G$
only contains a negligible number of short cycles w.h.p. (Lemma 2.3). Thus, the analysis of Warning Propagation on
the random tree is the third main ingredient of the proof. This task will turn out to be equivalent to studying the fixed
point problem from Section 1.2 (Proposition 2.5). We proceed to outline the three main components of the proof.


2.1. The core. Given a vertex $u$ of a graph $G$ let $\partial_G u$ denote the neighbourhood of $u$ in $G$. We sometimes omit the
subscript $G$ when the graph is clear from the context. More particularly, in the random graph $G$, let $\partial_\pm u$ denote the
set of all neighbours $w$ of $u$ in $G$ with $\sigma(w)\sigma(u) = \pm 1$. Following [9], we define $\mathcal C$ as the largest subset $U \subset [n]$ such
that

$$\big| |\partial_\pm u| - d_\pm \big| \le \frac c4 \sqrt{d_+ \ln d_+} \quad\text{and}\quad |\partial u \setminus U| \le 100 \quad\text{for all } u \in U. \tag{2.1}$$

Clearly, the set $\mathcal C$, which we call the core, is uniquely defined because any union of sets $U$ that satisfy (2.1) also has
the property. Let $\sigma_{\mathcal C} : \mathcal C \to \{\pm 1\}$, $v \mapsto \sigma(v)$ be the restriction of the "planted assignment" to $\mathcal C$.
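Because the family of sets satisfying (2.1) is closed under unions, the largest such set can be found by a standard peeling procedure: discard every vertex that violates the degree condition, then repeatedly discard vertices with too many neighbours outside the current set. The sketch below is an illustration only; the function name `core`, the adjacency-dictionary input and the parameters `slack` (standing in for the degree tolerance in (2.1)) and `max_outside` are our own assumptions.

```python
def core(adj, sigma, d_plus, d_minus, slack, max_outside=100):
    """Largest vertex set U satisfying a condition of the form (2.1).

    adj maps each vertex to the set of its neighbours and
    sigma maps each vertex to its planted sign +1 or -1.
    """
    def degree_ok(u):
        same = sum(1 for w in adj[u] if sigma[w] == sigma[u])
        diff = len(adj[u]) - same
        return abs(same - d_plus) <= slack and abs(diff - d_minus) <= slack

    # The degree condition does not depend on U, so it is checked once.
    survivors = {u for u in adj if degree_ok(u)}
    while True:
        # Vertices with too many neighbours outside the current set must go.
        bad = {u for u in survivors
               if sum(1 for w in adj[u] if w not in survivors) > max_outside}
        if not bad:
            return survivors
        survivors -= bad
```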

Furthermore, for a graph $G$, a set $U \subset V(G)$ and a map $\sigma : U \to \{-1, 1\}$ we let

$$\mathrm{cut}(G, \sigma) := \min\left\{ \sum_{\{v,w\} \in E(G)} \frac{1 - \tau(v)\tau(w)}{2} \;:\; \tau : V(G) \to \{\pm 1\} \text{ satisfies } \tau(v) = \sigma(v) \text{ for all } v \in U \right\}.$$

In words, $\mathrm{cut}(G, \sigma)$ is the smallest number of edges in a cut of $G$ that separates the vertices in $U \cap \sigma^{-1}(-1)$ from
those in $U \cap \sigma^{-1}(1)$. In particular, $\mathrm{cut}(G, \sigma_{\mathcal C})$ is the smallest cut of $G$ that separates the vertices in the core $\mathcal C$ that are
frozen to $-1$ from those that are frozen to $1$.

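Finally, for any vertex $v$ we define a set $\mathcal C_v = \mathcal C_v(G, \sigma)$ of vertices via the following process.

C1: Let $\mathcal C_v^{(0)} = \{v\} \cup \partial_G v$.

C2: Inductively, let $\mathcal C_v^{(t+1)} = \mathcal C_v^{(t)} \cup \bigcup_{u \in \mathcal C_v^{(t)} \setminus \mathcal C} \partial_G u$ and let $\mathcal C_v = \bigcup_{t \ge 0} \mathcal C_v^{(t)}$.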
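Steps C1 and C2 amount to a graph search that starts from $v$ and its neighbours and is only continued through vertices outside the core. A minimal sketch, assuming an adjacency dictionary and a precomputed core set (the name `explore` is hypothetical):

```python
def explore(adj, core_set, v):
    """Return the set built by C1-C2 for the start vertex v."""
    # C1: start with v together with all of its neighbours.
    reached = {v} | set(adj[v])
    # C2: only vertices outside the core contribute their neighbourhoods.
    frontier = [u for u in reached if u not in core_set]
    while frontier:
        u = frontier.pop()
        for w in adj[u]:
            if w not in reached:
                reached.add(w)
                if w not in core_set:
                    frontier.append(w)
    return reached
```

Core vertices are thus included as a boundary of the explored region but never expanded, except that the neighbours of $v$ itself are always included by C1.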

Lemma 2.1 ([9]). We have $\mathrm{bis}(G) = \mathrm{cut}(G, \sigma_{\mathcal C})$ and $|\mathcal C| \ge n(1 - d_+^{-100})$ w.h.p. Furthermore, for any $\varepsilon > 0$ there
exists $\omega > 0$ such that w.h.p. $\sum_{v \in [n]} |\mathcal C_v| \cdot \mathbf 1\{|\mathcal C_v| \ge \omega\} \le \varepsilon n$.

2.2. Warning Propagation. To calculate $\mathrm{cut}(G, \sigma_{\mathcal C})$ we adopt the Warning Propagation ("WP") message passing
algorithm.¹ Let us first introduce WP for a generic graph $G = (V(G), E(G))$ and a map $\sigma : U \subset V(G) \to \{-1, 1\}$.
At each time $t \ge 0$, WP sends a "message" $\mu_{v \to w}(t|G, \sigma) \in \{-1, 0, 1\}$ from $v$ to $w$ for any edge $\{v, w\} \in E(G)$.
The messages are directed objects, i.e., $\mu_{v \to w}(t|G, \sigma)$ and $\mu_{w \to v}(t|G, \sigma)$ may differ. They are defined inductively by

$$\mu_{v \to w}(0|G, \sigma) := \begin{cases} \sigma(v) & \text{if } v \in U,\\ 0 & \text{otherwise,} \end{cases} \qquad \mu_{v \to w}(t+1|G, \sigma) := \psi\Big( \sum_{u \in \partial v \setminus w} \mu_{u \to v}(t|G, \sigma) \Big). \tag{2.2}$$

Thus, the WP messages are initialised according to $\sigma$. Subsequently, $v$ sends message $\pm 1$ to $w$ if it receives more $\pm 1$
than $\mp 1$ messages from its neighbours $u \ne w$. If there is a tie, $v$ sends out $0$. Finally, for $t \ge 0$ define

$$\mu_v(t|G, \sigma) := \sum_{w \in \partial v} \mu_{w \to v}(t|G, \sigma).$$
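The update rule (2.2) translates directly into code. The following sketch runs a fixed number of synchronous WP rounds on a graph given as an adjacency dictionary; the function names and the dictionary-based message store are our own choices, not the paper's.

```python
def clamp(x):
    """The function psi restricted to integers: truncate to [-1, 1]."""
    return max(-1, min(1, x))

def warning_propagation(adj, sigma, rounds):
    """Run WP and return (messages, marginals) after the given number of rounds.

    adj maps each vertex to the set of its neighbours; sigma maps the
    vertices of U to +1 or -1 (vertices outside U are simply absent).
    """
    # Initialisation: v sends sigma(v) if v is in U and 0 otherwise.
    msgs = {(v, w): sigma.get(v, 0) for v in adj for w in adj[v]}
    for _ in range(rounds):
        # Synchronous update: all new messages are built from the old ones.
        msgs = {(v, w): clamp(sum(msgs[(u, v)] for u in adj[v] if u != w))
                for v in adj for w in adj[v]}
    # mu_v(t) is the plain (untruncated) sum of all incoming messages.
    marginals = {v: sum(msgs[(w, v)] for w in adj[v]) for v in adj}
    return msgs, marginals
```

Note that a vertex whose only neighbour is $w$ sends $0$ to $w$ from round one onwards, because the sum in (2.2) is then empty.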


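Proposition 2.2. For any $\varepsilon > 0$ there exists $t_0 = t_0(\varepsilon, d_+, d_-)$ such that for all $t \ge t_0$ w.h.p.

$$\left| \mathrm{cut}(G, \sigma_{\mathcal C}) - \frac12 \sum_{v \in [n]} \sum_{w \in \partial_G v} \mathbf 1\{\mu_{w \to v}(t|G, \sigma) = -\tilde\psi(\mu_v(t|G, \sigma))\} \right| \le \varepsilon n.$$

We defer the proof of Proposition 2.2 to Section 3.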


2.3. The local structure. Proposition 2.2 shows that w.h.p. in order to approximate $\mathrm{cut}(G, \sigma_{\mathcal C})$ up to a small error
of $\varepsilon n$ we merely need to run WP for a number $t_0$ of rounds that is bounded in terms of $\varepsilon$. The upshot is that the
WP messages $\mu_{w \to v}(t|G, \sigma)$ that are required to figure out the minimum bisection width are determined by the local
structure of $G$. We show that the local structure of $G$ "converges to" a suitable Galton-Watson tree. For this purpose,
for simplicity we always say that the number of potential neighbours of any vertex in each class is $n/2$. This ignores
the fact that if $n$ is odd the classes do not have quite this size and the fact that a vertex cannot be adjacent to itself.
However, ignoring these difficulties will not affect our calculations in any significant way.

Our task boils down to studying WP on that Galton-Watson tree. Specifically, let $\mathbf T = \mathbf T_{d_+,d_-}$ be the Galton-Watson
tree with two types $+1, -1$ and offspring matrix

$$\begin{pmatrix} \mathrm{Po}(d_+) & \mathrm{Po}(d_-) \\ \mathrm{Po}(d_-) & \mathrm{Po}(d_+) \end{pmatrix}. \tag{2.3}$$

Hence, a vertex of type $\pm 1$ spawns $\mathrm{Po}(d_+)$ vertices of type $\pm 1$ and independently $\mathrm{Po}(d_-)$ vertices of type $\mp 1$.
Moreover, the type of the root vertex $r_{\mathbf T}$ is chosen uniformly at random. Let $\boldsymbol\tau = \boldsymbol\tau_{d_+,d_-} : V(\mathbf T) \to \{\pm 1\}$ assign
each vertex of $\mathbf T$ its type.
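A truncated sample of this two-type Galton-Watson tree can be generated level by level. The sketch below is illustrative only: the flat list representation (`types`, `parent`), the function names, and the use of Knuth's multiplication method for Poisson sampling are our own choices.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw from Po(lam) by Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def sample_gw_tree(d_plus, d_minus, depth, rng=None):
    """Sample the first `depth` generations; vertex 0 is the root.

    Returns (types, parent), where types[v] is +1 or -1 and parent[v]
    is the index of the parent of v (None for the root).
    """
    rng = rng or random.Random()
    types = [rng.choice([1, -1])]  # the root type is uniform
    parent = [None]
    frontier = [0]
    for _generation in range(depth):
        next_frontier = []
        for v in frontier:
            # Po(d_plus) children of the same type, Po(d_minus) of the other type.
            for sign, mean in ((types[v], d_plus), (-types[v], d_minus)):
                for _child in range(sample_poisson(mean, rng)):
                    types.append(sign)
                    parent.append(v)
                    next_frontier.append(len(types) - 1)
        frontier = next_frontier
    return types, parent
```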

The random graph $(G, \sigma)$ "converges to" $(\mathbf T, \boldsymbol\tau)$ in the following sense. For two triples $(G, r, \sigma)$, $(G', r', \sigma')$ of
graphs $G, G'$, root vertices $r \in V(G)$, $r' \in V(G')$ and maps $\sigma : V(G) \to \{\pm 1\}$, $\sigma' : V(G') \to \{\pm 1\}$ we write
$(G, r, \sigma) \cong (G', r', \sigma')$ if there is a graph isomorphism $\phi : G \to G'$ such that $\phi(r) = r'$ and $\sigma = \sigma' \circ \phi$. Further, we denote
by $\partial^t(G, r, \sigma)$ the rooted graph obtained from $(G, r)$ by deleting all vertices at distance greater than $t$ from $r$ together
with the restriction of $\sigma$ to this subgraph. The following lemma characterises the local structure of $(G, \sigma)$.
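The truncated neighbourhood $\partial^t(G, r, \sigma)$ is obtained by a breadth-first search of depth $t$. The sketch below returns its vertex set and tests whether the induced subgraph contains a cycle, which is the event counted in the second part of the lemma that follows. Function names and the adjacency-dictionary input are assumptions.

```python
from collections import deque

def ball(adj, root, t):
    """Vertices at distance at most t from root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == t:
            continue  # do not explore beyond depth t
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)

def ball_contains_cycle(adj, root, t):
    """True if the subgraph induced on the depth-t ball contains a cycle."""
    vertices = ball(adj, root, t)
    # Each induced edge is seen from both endpoints, hence the division by 2.
    edges = sum(1 for v in vertices for w in adj[v] if w in vertices) // 2
    # A connected graph is a tree exactly when it has |V| - 1 edges.
    return edges >= len(vertices)
```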


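Lemma 2.3. Let $t > 0$ be an integer and let $T$ be any tree with root $r$ and map $\tau : V(T) \to \{\pm 1\}$. Then

$$\frac1n \sum_{v \in [n]} \mathbf 1\{\partial^t(G, v, \sigma) \cong \partial^t(T, r, \tau)\} \;\xrightarrow{n \to \infty}\; \mathbb P\left[ \partial^t(\mathbf T, r_{\mathbf T}, \boldsymbol\tau) \cong \partial^t(T, r, \tau) \right] \quad \text{in probability.}$$

Furthermore, w.h.p. $G$ does not contain more than $\ln n$ vertices $v$ such that $\partial^t(G, v, \sigma)$ contains a cycle.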


Proof. Given a tree $T$ with root $r$ and map $\tau : V(T) \to \{\pm 1\}$, let

$$X_t = X_t(T, r, \tau) = \frac1n \sum_{v \in [n]} \mathbf 1\{\partial^t(G, v, \sigma) \cong \partial^t(T, r, \tau)\}$$

and

$$p_t = p_t(T, r, \tau) = \mathbb P\left[ \partial^t(\mathbf T, r_{\mathbf T}, \boldsymbol\tau) \cong \partial^t(T, r, \tau) \right].$$

¹ A discussion of Warning Propagation in the context of the "cavity method" from statistical physics can be found in ED



The proof proceeds by induction on $t$. If $t = 0$, pick a vertex $v \in [n]$ uniformly at random, then $X_0 = \mathbb P_v(\sigma(v) = \tau(r)) = \frac12$
and $p_0 = \mathbb P(\boldsymbol\tau(r_{\mathbf T}) = \tau(r)) = \frac12$ for any $\tau(r) \in \{\pm 1\}$. To proceed from $t$ to $t + 1$, let $d$ denote the number
of children $v_1, \ldots, v_d$ of $r$ in $T$. For each $i = 1, \ldots, d$, let $T_i$ denote the tree rooted at $v_i$ in the forest obtained
from $T$ by removing $r$ and let $\tau_i : V(T_i) \to \{\pm 1\}$ denote the restriction of $\tau$ to the vertex set of $T_i$. Finally, let
$C_1, \ldots, C_{\tilde d}$ for some $\tilde d \le d$ denote the distinct isomorphism classes among $\{\partial^t(T_i, v_i, \tau_i) : i = 1, \ldots, d\}$, and let
$c_j = |\{i : \partial^t(T_i, v_i, \tau_i) \in C_j\}|$. Let $v \in [n]$ be an arbitrary vertex in $G$. Our aim is to determine the probability
of the event $\{\partial^{t+1}(G, v, \sigma) \cong \partial^{t+1}(T, r, \tau)\}$. Therefore, we think of $G$ as being created in three rounds. First,
partition $[n]$ in two classes. Second, randomly insert edges between vertices in $[n] \setminus \{v\}$ according to their planted
sign. Finally, reveal the neighbours of $v$. For the above event to happen, $v$ must have $d$ neighbours in $G$. Since $|\partial_\pm v|$
are independent binomially distributed random variables with parameters $\frac n2$ and $p_\pm$ and because $\frac n2 p_\pm = d_\pm$, we may
approximate $|\partial_\pm v|$ with a Poisson distribution, and $v$ has degree $d$ with probability

$$\frac{(d_+ + d_-)^d}{d!\,\exp(d_+ + d_-)} + o(1).$$


Conditioned on $v$ having degree $d$, by induction $v$ is adjacent to precisely $c_j$ vertices with neighbourhood isomorphic
to $\partial^t(T_i, v_i, \tau_i) \in C_j$ with probability

$$\binom{d}{c_1 \,\ldots\, c_{\tilde d}} \prod_{j=1}^{\tilde d} p_t(C_j)^{c_j} + o(1).$$

The number of cycles of length $\ell \le 2t + 3$ in $G$ is stochastically bounded by the number of such cycles in $G(n, d_+/n)$
(the standard 1-type binomial random graph). For each $\ell$, this number tends in distribution to a Poisson variable with
bounded mean (see e.g. Theorem 3.19 in [22]) and so the total number of such cycles is bounded w.h.p. Thus all the
pairwise distances (in $G - v$) between neighbours of $v$ are at least $2t + 1$ w.h.p. (and in particular this proves the
second part of the lemma). Therefore

$$\mathbb E_G[X_{t+1}] = \frac{(d_+ + d_-)^d}{d!\,\exp(d_+ + d_-)} \binom{d}{c_1 \,\ldots\, c_{\tilde d}} \prod_{j=1}^{\tilde d} p_t(C_j)^{c_j} + o(1).$$


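By definition of $\mathbf T$, we obtain $\mathbb E_G[X_{t+1}] = p_{t+1} + o(1)$. To apply Chebyshev's inequality, it remains to determine
$\mathbb E_G[X_{t+1}^2]$. Let $v, w \in [n]$ be two randomly chosen vertices. Then w.h.p. $v$ and $w$ have distance at least $2t + 3$ in $G$,
conditioned on which $\partial^{t+1}(G, v, \sigma)$ and $\partial^{t+1}(G, w, \sigma)$ are independent. Therefore we obtain

$$\mathbb P_{v,w}\left( \partial^{t+1}(G, v, \sigma) \cong \partial^{t+1}(T, r, \tau) \wedge \partial^{t+1}(G, w, \sigma) \cong \partial^{t+1}(T, r, \tau) \right)
= \mathbb P_v\left( \partial^{t+1}(G, v, \sigma) \cong \partial^{t+1}(T, r, \tau) \right) \mathbb P_w\left( \partial^{t+1}(G, w, \sigma) \cong \partial^{t+1}(T, r, \tau) \right) + o(1).$$

And finally

$$\mathbb E_G[X_{t+1}^2] = \frac1n \mathbb E_G[X_{t+1}] + \mathbb E_G\left[ \mathbb P_v\left( \partial^{t+1}(G, v, \sigma) \cong \partial^{t+1}(T, r, \tau) \right) \mathbb P_w\left( \partial^{t+1}(G, w, \sigma) \cong \partial^{t+1}(T, r, \tau) \right) \right] + o(1)
= \mathbb E_G[X_{t+1}]^2 + o(1).$$

The first assertion follows from Chebyshev's inequality. □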

2.4. The fixed point. Let $(T, r, \tau)$ be a rooted tree together with a map $\tau : V(T) \to \{\pm 1\}$. Then for any pair $v, w$
of adjacent vertices we have the WP messages $\mu_{v \to w}(t|T, \tau)$, $t \ge 0$, as defined in (2.2). Since we are going to be
particularly interested in the messages directed towards the root, we introduce the following notation. Given the root
$r$, any vertex $v \ne r$ of $T$ has a unique parent vertex $u$ (the neighbour of $v$ on the unique path from $v$ to $r$). Initially, let

$$\mu_{v\uparrow}(0|T, r, \tau) = \tau(v) \tag{2.4}$$

and define

$$\mu_{v\uparrow}(t|T, r, \tau) = \mu_{v \to u}(t|T, \tau) \tag{2.5}$$

for $t > 0$. In addition, set $\mu_{r\uparrow}(0|T, r, \tau) = \tau(r)$ and let

$$\mu_{r\uparrow}(t+1|T, r, \tau) = \psi\Big( \sum_{v \in \partial r} \mu_{v\uparrow}(t|T, r, \tau) \Big) \qquad (t \ge 0) \tag{2.6}$$

be the message that $r$ would send to its parent if there was one.

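For $p = (p(-1), p(0), p(1)) \in \mathcal P(\{-1,0,1\})$ we let $\bar p = (p(1), p(0), p(-1))$. Remembering the map

$$\mathcal T = \mathcal T_{d_+,d_-} : \mathcal P(\{-1,0,1\}) \to \mathcal P(\{-1,0,1\})$$

from Section 1.2 and writing $\mathcal T^t$ for its $t$-fold iteration, we observe the following.

Lemma 2.4. Let $p_t = \mathcal T^t(0, 0, 1)$.

(1) Given that $\boldsymbol\tau(r_{\mathbf T}) = +1$, the message $\mu_{r_{\mathbf T}\uparrow}(t|\mathbf T, r_{\mathbf T}, \boldsymbol\tau)$ has distribution $p_t$.

(2) Given that $\boldsymbol\tau(r_{\mathbf T}) = -1$, the message $\mu_{r_{\mathbf T}\uparrow}(t|\mathbf T, r_{\mathbf T}, \boldsymbol\tau)$ has distribution $\bar p_t$.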

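Proof. The proof is by induction on $t$. In the case $t = 0$ the assertion holds because $\mu_{r_{\mathbf T}\uparrow}(0|\mathbf T, r_{\mathbf T}, \boldsymbol\tau) = \boldsymbol\tau(r_{\mathbf T})$.
Now, assume that the assertion holds for $t$. To prove it for $t + 1$, let $C_\pm$ be the set of all children $v$ of $r_{\mathbf T}$ with
$\boldsymbol\tau(r_{\mathbf T})\boldsymbol\tau(v) = \pm 1$. By construction, $|C_\pm|$ has distribution $\mathrm{Po}(d_\pm)$. Furthermore, let $(\mathbf T_v, v, \boldsymbol\tau_v)$ signify the subtree
pending on a child $v$ of $r_{\mathbf T}$. Because $\mathbf T$ is a Galton-Watson tree, the random subtrees $\mathbf T_v$ are mutually independent.
Moreover, each $\mathbf T_v$ is distributed as a Galton-Watson tree with offspring matrix (2.3) and a root vertex of type $\pm\boldsymbol\tau(r_{\mathbf T})$
for each $v \in C_\pm$. Therefore, by induction the message $\mu_{v\uparrow}(t|\mathbf T_v, v, \boldsymbol\tau_v)$ has distribution $p_t$ if $\boldsymbol\tau(v) = 1$ resp. $\bar p_t$ if
$\boldsymbol\tau(v) = -1$. As a consequence,

$$\mu_{r_{\mathbf T}\uparrow}(t+1|\mathbf T, r_{\mathbf T}, \boldsymbol\tau) = \psi\Big( \sum_{v \in C_+} \mu_{v\uparrow}(t|\mathbf T_v, v, \boldsymbol\tau_v) + \sum_{v \in C_-} \mu_{v\uparrow}(t|\mathbf T_v, v, \boldsymbol\tau_v) \Big)$$

has distribution $p_{t+1}$ if $\boldsymbol\tau(r_{\mathbf T}) = 1$ and $\bar p_{t+1}$ otherwise. □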

Lemma 2.4 shows that the operator $\mathcal T$ mimics WP on the Galton-Watson tree $(\mathbf T, r_{\mathbf T}, \boldsymbol\tau)$. Hence, to understand the
behaviour of WP after a large enough number of iterations we need to investigate the fixed point to which $\mathcal T^t(0, 0, 1)$
converges as $t \to \infty$. In Section 4 we will establish the following.

Proposition 2.5. The operator $\mathcal T$ has a unique skewed fixed point $p^*$ and $\lim_{t \to \infty} \mathcal T^t(0, 0, 1) = p^*$.

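Proof of Theorem 1.1. Consider the random variables

$$X_n := \frac1n \mathrm{bis}(G), \qquad Y_n^{(t)} := \frac12 \cdot \frac1n \sum_{v \in [n]} \sum_{w \in \partial_G v} \mathbf 1\{\mu_{w \to v}(t|G, \sigma) = -\tilde\psi(\mu_v(t|G, \sigma))\}.$$

Then Lemma 2.1 and Proposition 2.2 imply that for any $\varepsilon > 0$,

$$\lim_{t \to \infty} \lim_{n \to \infty} \mathbb P\left[ |X_n - Y_n^{(t)}| > \varepsilon \right] = 0. \tag{2.7}$$

By Definition (2.2), $\mu_{w \to v}(t|G, \sigma)$ and $\mu_v(t|G, \sigma)$ are determined by $\partial_G^t v$ and the initialisation $\mu_{u \to w}(0|G, \sigma)$ for
all $u, w \in \partial_G^t v$, $\{u, w\} \in E(G)$. Since (2.5) and (2.6) match the recursive definition (2.2) of $\mu_{w \to v}(t|G, \sigma)$ and
$\mu_v(t|G, \sigma)$, Lemma 2.3 implies that for any fixed $t > 0$ (as $n$ tends to infinity),

$$Y_n^{(t)} \to x^{(t)} := \frac12\,\mathbb E\left[ \sum_{w \in \partial r_{\mathbf T}} \mathbf 1\{\mu_{w\uparrow}(t|\mathbf T, r_{\mathbf T}, \boldsymbol\tau) = -\tilde\psi(\mu_{r_{\mathbf T}}(t|\mathbf T, r_{\mathbf T}, \boldsymbol\tau))\} \right] \quad \text{in probability.} \tag{2.8}$$

Now let $p^*$ denote the unique skewed fixed point of $\mathcal T$ guaranteed by Proposition 2.5. Since each child of $r_{\mathbf T}$ can
be considered a root of an independent instance of $\mathbf T$ to which we can apply Lemma 2.4, we obtain that given
$(\boldsymbol\tau(w))_{w \in \partial r_{\mathbf T}}$ the sequence $(\mu_{w\uparrow}(t|\mathbf T, r_{\mathbf T}, \boldsymbol\tau))_{w \in \partial r_{\mathbf T}}$ converges to a sequence of independent random variables $(\eta_w)_{w \in \partial r_{\mathbf T}}$
with distribution $p^*$ (if $\boldsymbol\tau(w) = 1$) and $\bar p^*$ (if $\boldsymbol\tau(w) = -1$). By definition $\mu_{r_{\mathbf T}}(t|\mathbf T, r_{\mathbf T}, \boldsymbol\tau)$ converges to
$\sum_{w \in \partial r_{\mathbf T},\, \boldsymbol\tau(w) = 1} \eta_w + \sum_{w \in \partial r_{\mathbf T},\, \boldsymbol\tau(w) = -1} \eta_w$. Considering the offspring distributions of $r_{\mathbf T}$ in both cases, i.e. $\boldsymbol\tau(r_{\mathbf T}) = \pm 1$, we obtain from
$\varphi_{d_+,d_-}(\bar p) = \varphi_{d_+,d_-}(p)$ for all $p \in \mathcal P(\{-1,0,1\})$ that

$$\lim_{t \to \infty} x^{(t)} = \varphi_{d_+,d_-}(p^*). \tag{2.9}$$

Finally, combining (2.7)-(2.9) completes the proof. □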






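3. Proof of Proposition 2.2


Lemma 3.1. If $v \in \mathcal C$ and $w \in \partial_G v$, then $\mu_{v \to w}(t|G, \sigma) = \sigma(v) = \mu_{v \to w}(t|G, \sigma_{\mathcal C})$ for all $t \ge 0$.

Proof. We proceed by induction on $t$. For $t = 0$ the assertion is immediate from the initialisation of the messages. To
go from $t$ to $t + 1$, consider $v \in \mathcal C$ and $w \in \partial_G v$. We may assume without loss of generality that $\sigma(v) = 1$. By the
definition of the WP message,

$$\mu_{v \to w}(t+1|G, \sigma) = \psi\Big( \sum_{u \in \partial_G v \setminus \{w\}} \mu_{u \to v}(t|G, \sigma) \Big) = \psi(S_+ + S_- + S_0) \tag{3.1}$$

where

$$S_+ := \sum_{u \in \mathcal C \cap \sigma^{-1}(+1) \cap \partial_G v \setminus \{w\}} \mu_{u \to v}(t|G, \sigma), \qquad
S_- := \sum_{u \in \mathcal C \cap \sigma^{-1}(-1) \cap \partial_G v \setminus \{w\}} \mu_{u \to v}(t|G, \sigma), \qquad
S_0 := \sum_{u \in \partial_G v \setminus (\mathcal C \cup \{w\})} \mu_{u \to v}(t|G, \sigma).$$

Now, (2.1) ensures that

$$S_+ \ge d_+ - \frac c4 \sqrt{d_+ \ln d_+}, \qquad S_- \ge -d_- - \frac c4 \sqrt{d_+ \ln d_+}, \qquad |S_0| \le 100 \le \frac c4 \sqrt{d_+ \ln d_+}, \tag{3.2}$$

provided that the constant $c > 0$ is chosen large enough. Combining (3.1) and (3.2), we see that $S_+ + S_- + S_0 \ge 1$
and thus $\mu_{v \to w}(t+1|G, \sigma) = 1$. The exact same argument works for $\mu_{v \to w}(t+1|G, \sigma_{\mathcal C}) = 1$. □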


Let $G_v$ denote the subgraph of $G$ induced on $\mathcal C_v$. To prove Proposition 2.2 fix $s > 0$ large enough. Let $\mathcal S = \mathcal S(s)$
be the set of all vertices $v$ such that either $|\mathcal C_v| > \sqrt s$ or $G_v$ is cyclic. Then Lemma 2.1 (with slightly smaller $\varepsilon$) and
Lemma 2.3 imply that $|\mathcal S| \le \varepsilon n$ w.h.p. For the rest of this section, let $v \notin \mathcal S$ be fixed.

For $w \in \mathcal C_v \setminus \{v\}$ we let $w_{\uparrow v}$ be the neighbour of $w$ on the path from $w$ to $v$. We define $G_{w \to v}$ as the component
of $w$ in the graph obtained from $G_v$ by removing the edge $\{w, w_{\uparrow v}\}$. The vertex set of $G_{w \to v}$ will be denoted
by $\mathcal C_{w \to v}$. Further, $h_{w \to v}$ is the maximum distance between $w$ and any other vertex in $G_{w \to v}$. Additionally, $h_v$ is
the maximum distance between $v$ and any other vertex in $G_v$. Finally, let $\sigma_v : \mathcal C_v \to \{\pm 1\}$, $w \mapsto \sigma(w)$ and let
$\sigma_{\mathcal C, v} : \mathcal C_v \cap \mathcal C \to \{\pm 1\}$, $w \mapsto \sigma_{\mathcal C}(w)$.

Lemma 3.2. (1) For any $w \in \mathcal C_v \setminus \{v\}$ and any $t > h_{w \to v}$ we have

$$\mu_{w \to w_{\uparrow v}}(t|G, \sigma) = \mu_{w \to w_{\uparrow v}}(h_{w \to v} + 1|G, \sigma) = \mu_{w \to w_{\uparrow v}}(t|G, \sigma_{\mathcal C}).$$

(2) For any $t > h_v$ we have $\mu_v(t|G, \sigma) = \mu_v(h_v + 1|G, \sigma) = \mu_v(t|G, \sigma_{\mathcal C})$.


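Proof. The proof of (1) proceeds by induction on $h_{w \to v}$. The construction C1-C2 of $\mathcal C_v$ ensures that any $w \in \mathcal C_v$ with
$h_{w \to v} = 0$ either belongs to $\mathcal C$ or has no neighbour besides $w_{\uparrow v}$. Hence for the first case the assertion follows from
Lemma 3.1. If $\partial_G w \setminus \{w_{\uparrow v}\} = \emptyset$ we obtain that $\mu_{w \to w_{\uparrow v}}(t|G, \sigma) = \mu_{w \to w_{\uparrow v}}(t|G, \sigma_{\mathcal C}) = 0$ for all $t \ge 1$ by the
definition of the WP messages. Now, assume that $h_{w \to v} > 0$ and let $t > h_{w \to v}$. Then all neighbours $u \ne w_{\uparrow v}$ of $w$ in
$G_{w \to v}$ satisfy $h_{u \to v} < h_{w \to v}$. Thus, by induction

$$\mu_{w \to w_{\uparrow v}}(t|G, \sigma) = \psi\Big( \sum_{u \in \partial_G w \setminus \{w_{\uparrow v}\}} \mu_{u \to w}(t-1|G, \sigma) \Big)
= \psi\Big( \sum_{u \in \partial_G w \setminus \{w_{\uparrow v}\}} \mu_{u \to w}(h_{u \to v} + 1|G, \sigma) \Big) = \mu_{w \to w_{\uparrow v}}(h_{w \to v} + 1|G, \sigma).$$

An analogous argument applies to $\mu_{w \to w_{\uparrow v}}(t|G, \sigma_{\mathcal C})$. The proof of (2) is similar. □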


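For each vertex $w \in \mathcal C_v$, $w \ne v$, let $\mu^*_{w \to v} = \mu_{w \to w_{\uparrow v}}(s|G, \sigma)$. Further, let $\mu^*_v = \mu_v(s|G, \sigma)$. In addition, for
$z \in \{\pm 1\}$ let

$$\sigma^z_{w \to v} : \mathcal C_{w \to v} \cap (\{w\} \cup \mathcal C) \to \{\pm 1\}, \qquad u \mapsto \begin{cases} z & \text{if } u = w,\\ \sigma(u) & \text{otherwise.} \end{cases}$$

In words, $\sigma^z_{w \to v}$ freezes $w$ to $z$ and all other $u \in \mathcal C_{w \to v}$ that belong to the core to $\sigma(u)$. Analogously, let

$$\sigma^z_v : \mathcal C_v \cap (\{v\} \cup \mathcal C) \to \{\pm 1\}, \qquad u \mapsto \begin{cases} z & \text{if } u = v,\\ \sigma(u) & \text{otherwise.} \end{cases}$$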

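Lemma 3.3. Suppose that $u \in \mathcal C_v \setminus \{v\}$, such that $h_{u \to v} \ge 1$.

(1) If $z = \mu^*_{u \to v} \in \{-1, 1\}$, then

$$\mathrm{cut}(G_{u \to v}, \sigma^z_{u \to v}) < \mathrm{cut}(G_{u \to v}, \sigma^{-z}_{u \to v}). \tag{3.3}$$

Similarly, if $z = \psi(\mu^*_v) \in \{-1, 1\}$, then

$$\mathrm{cut}(G_v, \sigma^z_v) < \mathrm{cut}(G_v, \sigma^{-z}_v). \tag{3.4}$$

(2) If $\mu^*_{u \to v} = 0$, then

$$\mathrm{cut}(G_{u \to v}, \sigma^{+1}_{u \to v}) = \mathrm{cut}(G_{u \to v}, \sigma^{-1}_{u \to v}). \tag{3.5}$$

Similarly, if $\mu^*_v = 0$, then

$$\mathrm{cut}(G_v, \sigma^{+1}_v) = \mathrm{cut}(G_v, \sigma^{-1}_v). \tag{3.6}$$

Proof. We prove (3.3) and (3.5) by induction on $h_{u \to v}$. If $h_{u \to v} = 1$ then we have that all neighbours $w \in \partial_{G_{u \to v}} u$ of
$u$ with $\mu^*_{w \to v} \ne 0$ are in $\mathcal C$, i.e. fixed under $\sigma^z_{u \to v}$. Since $\mathcal C_{u \to v} = \partial_G u \setminus \{u_{\uparrow v}\} \cup \{u\}$, we obtain

$$\mathrm{cut}(G_{u \to v}, \sigma^{-1}_{u \to v}) - \mathrm{cut}(G_{u \to v}, \sigma^{+1}_{u \to v}) = \sum_{w \in \partial_G u \setminus \{u_{\uparrow v}\}} \mu^*_{w \to v} \tag{3.7}$$

by definition of $z$. By the induction hypothesis and because $G_{u \to v}$ is a tree (as $v \notin \mathcal S$) we have that (3.7) holds for
$h_{u \to v} > 1$ as well. A similar argument yields (3.4) and (3.6). □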


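Now, let $\mathcal U_v$ be the set of all $w \in \mathcal C_v$ such that $\mu^*_{w \to v} \ne 0$. Furthermore, let

$$\sigma_{\uparrow v} : \mathcal U_v \cup \{v\} \to \{-1, +1\}, \qquad w \mapsto \begin{cases} \tilde\psi(\mu^*_v) & \text{if } w = v,\\ \mu^*_{w \to v} & \text{otherwise.} \end{cases}$$

Thus, $\sigma_{\uparrow v}$ sets all $w \in \mathcal C_v \cap \mathcal C \setminus \{v\}$ to their planted sign and all $w \in \mathcal U_v \setminus \mathcal C$ to $\mu^*_{w \to v}$. Moreover, $\sigma_{\uparrow v}$ sets $v$ to $\psi(\mu^*_v)$
if $\psi(\mu^*_v) \ne 0$ and to $1$ if there is a tie.


Corollary 3.4. We have $\mathrm{cut}(G_v, \sigma_{\mathcal C}) = \mathrm{cut}(G_v, \sigma_{\uparrow v})$.

Proof. This is immediate from Lemma 3.3. □


Hence, in order to determine an optimal cut of $G_v$ we merely need to figure out the assignment of the vertices in
$\mathcal C_v \setminus (\{v\} \cup \mathcal U_v)$. Suppose that $\sigma^*_{\uparrow v} : \mathcal C_v \to \{\pm 1\}$ is an optimal extension of $\sigma_{\uparrow v}$ to a cut of $G_v$, i.e.,

$$\mathrm{cut}(G_v, \sigma_{\uparrow v}) = \sum_{\{u,w\} \in E(G_v)} \frac{1 - \sigma^*_{\uparrow v}(u)\sigma^*_{\uparrow v}(w)}{2}.$$

Corollary 3.5. It holds that $\sum_{w \in \partial_G v} \frac12 (1 - \sigma^*_{\uparrow v}(v)\sigma^*_{\uparrow v}(w)) = \sum_{w \in \partial_G v} \mathbf 1\{\mu^*_{w \to v} = -\tilde\psi(\mu^*_v)\}$.

Proof. Part (2) of Lemma 3.3 implies that $\sigma^*_{\uparrow v}(v)\sigma^*_{\uparrow v}(w) = 1$ for all $w \in \partial_G v$ such that $\mu^*_{w \to v} = 0$. □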

Proof of Proposition 2.2. Given $\varepsilon > 0$ choose $\delta = \delta(\varepsilon, d_+, d_-)$ sufficiently small and $s = s(\varepsilon, \delta, d_+, d_-) > 0$
sufficiently large. In particular, pick $s$ large enough so that

$$\mathbb P(|\mathcal S| > \delta n) < \varepsilon.$$

Provided that $\delta$ is suitably small, the Chernoff bound implies that for large $n$


(3.8) 





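Now, suppose that $\sigma^*_{\mathcal C}$ is an optimal extension of $\sigma_{\mathcal C}$ to a cut of $G$ and let $v \notin \mathcal S$. Then using the definition of $\mathcal C_v$,
Corollary 3.4 implies that

$$\sum_{w \in \partial_G v} \left( 1 - \sigma^*_{\mathcal C}(v)\sigma^*_{\mathcal C}(w) \right) = \sum_{w \in \partial_G v} \left( 1 - \sigma^*_{\uparrow v}(v)\sigma^*_{\uparrow v}(w) \right).$$

Therefore, we obtain

$$\mathbb P\left( \Big| \mathrm{cut}(G, \sigma_{\mathcal C}) - \frac12 \sum_{v \notin \mathcal S} \sum_{w \in \partial_G v} \mathbf 1\{\mu^*_{w \to v} = -\tilde\psi(\mu^*_v)\} \Big| > \varepsilon n \right) \le \mathbb P\left( \sum_{v \in \mathcal S} |\partial_G v| > \varepsilon n \right) \le 2\varepsilon.$$

The assertion follows from Lemma 3.2 for $t \ge s$. □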


4. Proof of Proposition 2.5

We continue to denote the set of probability measures on $\mathcal X \subset \mathbb R^k$ by $\mathcal P(\mathcal X)$. For an $\mathcal X$-valued random variable $X$ we
denote by $\mathcal L(X) \in \mathcal P(\mathcal X)$ the distribution of $X$. Furthermore, if $p, q \in \mathcal P(\mathcal X)$, then $\mathcal P_{p,q}(\mathcal X)$ denotes the set of all
probability measures $\mu$ on $\mathcal X \times \mathcal X$ such that the marginal distribution of the first (resp. second) component coincides
with $p$ (resp. $q$). The space $\mathcal P(\{-1,0,1\})$ is complete with respect to (any and in particular) the $L_1$-Wasserstein metric,
defined by

$$\ell_1(p, q) = \inf\left\{ \mathbb E|X - Y| : X, Y \text{ are random variables with } \mathcal L(X, Y) \in \mathcal P_{p,q}(\{-1,0,1\}) \right\}.$$

In words, the infimum of $\mathbb E|X - Y|$ is over all couplings $(X, Y)$ of the distributions $p, q$. Such a coupling $(X, Y)$
is optimal if $\ell_1(p, q) = \mathbb E|X - Y|$. Finally, let $\mathcal P^*(\{-1,0,1\})$ be the set of all skewed probability measures on
$\{-1,0,1\}$. Being a closed subset of $\mathcal P(\{-1,0,1\})$, $\mathcal P^*(\{-1,0,1\})$ is complete with respect to $\ell_1(\cdot, \cdot)$.
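For distributions on the real line the $L_1$-Wasserstein distance equals the integral of the absolute difference of the two distribution functions, a standard fact that the text does not use explicitly. On $\{-1, 0, 1\}$ this reduces to two terms, as the following sketch shows. The function name `wasserstein1` and the triple representation `(p(-1), p(0), p(1))` are our own assumptions.

```python
def wasserstein1(p, q):
    """L1-Wasserstein distance between distributions on {-1, 0, 1}.

    p and q are triples (mass at -1, mass at 0, mass at 1).
    """
    # Difference of the distribution functions on [-1, 0) and on [0, 1);
    # both intervals have length one, so the integral is a plain sum.
    gap_at_minus_one = p[0] - q[0]
    gap_at_zero = (p[0] + p[1]) - (q[0] + q[1])
    return abs(gap_at_minus_one) + abs(gap_at_zero)
```

For example, the point masses at $-1$ and at $+1$ are at distance 2, the largest possible value on this space.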

As in the definition (1.2)-(1.3) of the operator $\mathcal T = \mathcal T_{d_+,d_-}$, for $p \in \mathcal P(\{-1,0,1\})$ we let $(\eta_{p,i})_{i \ge 1}$ be a family of
independent random variables with distribution $p$. Further, let $\gamma_\pm = \mathrm{Po}(d_\pm)$ be independent of each other and of
the $(\eta_{p,i})_{i \ge 1}$. We introduce the shorthands

$$X_{p,+} = \sum_{i=1}^{\gamma_+} \eta_{p,i}, \qquad Y_{p,-} = \sum_{i=\gamma_++1}^{\gamma_++\gamma_-} \eta_{p,i}, \qquad \text{so that } Z_p = X_{p,+} - Y_{p,-}.$$

Also set $\Delta = c\sqrt{d_+ \ln d_+}$ and recall that $c > 0$ is a constant that we assume to be sufficiently large.

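Lemma 4.1. The operator $\mathcal T$ maps $\mathcal P^*(\{-1,0,1\})$ into itself.


Proof. Suppose that $p \in \mathcal P(\{-1,0,1\})$ is skewed. Then

$$\mathbb P(Z_p < 1) \le \mathbb P\left( X_{p,+} \le d_+ - \frac{\Delta - 1}{2} \right) + \mathbb P\left( Y_{p,-} \ge d_- + \frac{\Delta - 1}{2} \right). \tag{4.1}$$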


Since $|\eta_{p,i}| \le 1$ for all $i$, we can bound the second summand from above by invoking the Chernoff bound to obtain


(V > d- + ^d+\nd+~- ^ < ^+ 10 , 


(4.2) 


provided $c$ is large enough. To bound the other summand from above we use that $(\eta_{p,i})_{i \ge 1}$ is a sequence of independent
skewed random variables, whence by the Chernoff bound




A- 1 


< P (|7+ — d+\ > A/8) + P ( Z p — < d_|_ — 


A- 1 


7 +>d+- A/8 


< id; 10 + P [Bin(d+ - A/8,1 - df 10 ) < d+ - A/7] < 

o o 


provided that $c$ is sufficiently big. Combining (4.1)-(4.3) completes the proof.


(4.3) 

□ 


Lemma 4.2. The operator $\mathcal T$ is $\ell_1$-contracting on $\mathcal P^*(\{-1,0,1\})$.

Proof. Let $p, q \in \mathcal P^*(\{-1,0,1\})$. We aim to show that $\ell_1(\mathcal T(p), \mathcal T(q)) \le \frac12 \ell_1(p, q)$. To this end, we let $(\eta_{p,i}, \eta_{q,i})_{i \ge 1}$
be a family of random variables with distribution $p$ resp. $q$ such that $(\eta_{p,i})_{i \ge 1}$ are independent and $(\eta_{q,i})_{i \ge 1}$ are
independent but such that the pair $(\eta_{p,i}, \eta_{q,i})$ is an optimal coupling for every $i$. Then by the definition of $\ell_1(\cdot, \cdot)$,

$$\ell_1(\mathcal T(p), \mathcal T(q)) \le \mathbb E\left| \psi(Z_p) - \psi(Z_q) \right|. \tag{4.4}$$


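To estimate the r.h.s., let $\tilde\eta_{p,i} = \mathbf 1\{\eta_{p,i} = 1\}$, $\tilde\eta_{q,i} = \mathbf 1\{\eta_{q,i} = 1\}$. Further, let $\mathfrak F_i$ be the $\sigma$-algebra generated by
$\tilde\eta_{p,i}, \tilde\eta_{q,i}$ and let $\mathfrak F$ be the $\sigma$-algebra generated by $\gamma_+$, $\gamma_-$ and the random variables $(\tilde\eta_{p,i}, \tilde\eta_{q,i})_{i \ge 1}$. Additionally, let
$\gamma = \gamma_+ + \gamma_-$ and consider the three events

$$\mathfrak A_1 = \Big\{ \sum_{i=1}^{\gamma} \tilde\eta_{p,i}\tilde\eta_{q,i} \ge \gamma - 10 \Big\}, \qquad \mathfrak A_2 = \{\gamma \ge 2d_+\}, \qquad \mathfrak A_3 = \{\gamma_+ - \gamma_- \le 20\}.$$

We are going to bound $|\psi(Z_p) - \psi(Z_q)|$ on $\mathfrak A_1 \setminus (\mathfrak A_2 \cup \mathfrak A_3)$, $\overline{\mathfrak A_1 \cup \mathfrak A_2 \cup \mathfrak A_3}$, $\mathfrak A_2$ and $\mathfrak A_3 \setminus \mathfrak A_2$ separately. The bound on
the first event is immediate: if $\mathfrak A_1 \setminus (\mathfrak A_2 \cup \mathfrak A_3)$ occurs, then $\psi(Z_p) = \psi(Z_q) = 1$ with certainty. Hence,

$$\mathbb E\left[ |\psi(Z_p) - \psi(Z_q)| \cdot \mathbf 1_{\mathfrak A_1 \setminus (\mathfrak A_2 \cup \mathfrak A_3)} \right] = 0. \tag{4.5}$$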


Let us turn to the second event $\overline{\mathfrak A_1 \cup \mathfrak A_2 \cup \mathfrak A_3}$. Because the pairs $(\eta_{p,i}, \eta_{q,i})_{i \ge 1}$ are mutually independent, we find


K l\Vp,i ~ Vq,iWS} = E [\Vp,i ~ Vq,i\\ &] for a11 * > 1- 
Clearly, if $\tilde\eta_{p,i}\tilde\eta_{q,i} = 1$, then $\eta_{p,i} - \eta_{q,i} = 0$. Consequently,

E|?7 p ,j - rj qii \ _ E\r) Ptl - r] q s 

n P 


EMtj — n ,11 7-1 < ' lp ' 1 ~ ] ~ lqAl 

P '‘ q ’ ' ' P[Vp,ifjq,i = 0] P[fjp,lVq,l = 0] ' 

Since the events $\mathfrak A_1, \mathfrak A_2, \mathfrak A_3$ are $\mathfrak F$-measurable and because $\overline{\mathfrak A_2}$ ensures that $\gamma < 2d_+$, (4.6) and (4.7) yield


mWZp) - VWI I5]l ai ua 2 ua 3 < 2 ^ E| ^' 1 ^ • 1, 
P[fj p , ifj q ,i = 0\ st 1 ua 2 ua 3 -

Further, because the pairs $(\eta_{p,i}, \eta_{q,i})_{i \ge 1}$ are independent and because $p, q$ are skewed,

P (2li U 2l 2 U 2l 3 ) < P ^7 < 2 :d + , Pp.iVq.i < 7 - 10^ < (2d+ P (r? p ,i? 7 g ,i = 0)) 

Combining (4.8) and (4.9), we obtain

E [E[| ip{Z p ) - %j){Z q ) ||3] l , aiU2t2uai3 ] < {2d + ) n F(fj pA fj q ^ = 0 ) 9 E|? 7 p ,i - r) q ,i\. 

Since $p, q$ are skewed, we furthermore obtain $\mathbb P(\tilde\eta_{p,1}\tilde\eta_{q,1} = 0) \le 2d_+^{-10}$. Therefore

E [| f>{Zp) — fH-^<?)|l aiU 2i 2U 2i 3 ] = E [E [| f>{Z p ) — f’iZq) |U] l ai ua 2 ua 3 ] — 2 E|»7 P) i — %,i| 

With respect to $\mathfrak A_2$, the triangle inequality yields


(4.6) 


(4.7) 


(4.8) 


(4.9) 


(4.10) 


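$$\mathbb E\left[ |\psi(Z_p) - \psi(Z_q)| \mathbf 1_{\mathfrak A_2} \right] \le 2\,\mathbb E|\eta_{p,1} - \eta_{q,1}| \cdot \mathbb E[\gamma \mathbf 1_{\mathfrak A_2}]. \tag{4.11}$$

Further, since $\gamma = \mathrm{Po}(d_+ + d_-)$, the Chernoff bound entails that $\mathbb E[\gamma \mathbf 1_{\mathfrak A_2}] \le d_+^{-1}$ if the constant $c$ is chosen large
enough. Combining this estimate with (4.11), we get

$$\mathbb E\left[ |\psi(Z_p) - \psi(Z_q)| \mathbf 1_{\mathfrak A_2} \right] \le 2d_+^{-1}\,\mathbb E|\eta_{p,1} - \eta_{q,1}|. \tag{4.12}$$

Finally, on $\mathfrak A_3 \setminus \mathfrak A_2$ we have

$$\mathbb E\left[ |\psi(Z_p) - \psi(Z_q)| \mathbf 1_{\mathfrak A_3 \setminus \mathfrak A_2} \right] \le 4d_+\,\mathbb E|\eta_{p,1} - \eta_{q,1}|\,\mathbb P[\gamma_+ - \gamma_- \le 20]. \tag{4.13}$$

Since $\gamma_\pm = \mathrm{Po}(d_\pm)$ and $d_+ - d_- \ge \Delta$, the Chernoff bound yields $\mathbb P[\gamma_+ - \gamma_- \le 20] \le d_+^{-2}$, if $c$ is large enough.
Hence, (4.13) implies

$$\mathbb E\left[ |\psi(Z_p) - \psi(Z_q)| \mathbf 1_{\mathfrak A_3 \setminus \mathfrak A_2} \right] \le 4d_+^{-1}\,\mathbb E|\eta_{p,1} - \eta_{q,1}|. \tag{4.14}$$

Finally, the assertion follows from (4.4), (4.5), (4.10), (4.12) and (4.14). □


Proof of Proposition 2.5. The assertion follows from Lemmas 4.1 and 4.2 and the Banach fixed point theorem. □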